Anonymizing Unstructured Data

Authors

  • Rajeev Motwani
  • Shubha U. Nabar

Abstract

In this paper we consider the problem of anonymizing datasets in which each individual is associated with a set of items that constitute private information about the individual. Illustrative datasets include market-basket datasets and search engine query logs. We formalize the notion of k-anonymity for set-valued data as a variant of the k-anonymity model for traditional relational datasets. We define an optimization problem that arises from this definition of anonymity and provide a constant-factor approximation algorithm for it. We evaluate our algorithms on the America Online query log dataset.
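To make the set-valued notion of k-anonymity concrete, here is a minimal sketch in Python. It assumes (our assumption, not a statement of the paper's exact definition) that a dataset of item sets is k-anonymous when every distinct set occurs at least k times, and that anonymization may only suppress (delete) items. The function `anonymize_by_suppression` is a hypothetical greedy baseline that groups records into blocks of at least k and replaces each record by its block's intersection; it is not the paper's constant-factor approximation algorithm, and its cost is simply the number of deleted items.

```python
from collections import Counter
from typing import FrozenSet, List, Sequence, Tuple

Record = FrozenSet[str]  # one individual's items, e.g. a user's set of queries


def is_k_anonymous(records: Sequence[Record], k: int) -> bool:
    """Assumed definition: every distinct set appears at least k times."""
    counts = Counter(records)
    return all(c >= k for c in counts.values())


def anonymize_by_suppression(
    records: Sequence[Record], k: int
) -> Tuple[List[Record], int]:
    """Hypothetical greedy baseline (not the paper's algorithm).

    Partitions the records into groups of size >= k and replaces every
    record in a group by the intersection of the group's sets, so each
    output set occurs at least k times. Returns the anonymized records
    (in the original order) and the number of suppressed items.
    """
    n = len(records)
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= number of records")
    # Crude similarity heuristic: sort by set size, then by sorted contents.
    order = sorted(range(n), key=lambda i: (len(records[i]), sorted(records[i])))
    groups = [order[i:i + k] for i in range(0, n, k)]
    # Fold an undersized final group into the previous one.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    result: List[Record] = [frozenset()] * n
    cost = 0
    for group in groups:
        common = frozenset.intersection(*(records[i] for i in group))
        for i in group:
            result[i] = common
            cost += len(records[i]) - len(common)
    return result, cost
```

For example, with the query sets `{flu, fever}`, `{flu, cough}`, `{flu}` and `{car, loan}` and k = 2, the sketch produces two groups whose records become `{flu}` and the empty set, suppressing 5 items in total. A better grouping would lower that cost; choosing the grouping well is the kind of optimization problem the abstract refers to.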

Similar resources

Anonymizing Unstructured Data to Prevent Privacy Leaks during Data Mining

In this information age, data becomes more and more important. A lot of data is stored in the cloud; this means that you are not really in control of the data and it might be anywhere. This leads to possible data leaks and therefore to privacy leaks. Recently the Dutch government has introduced a new law that makes it obligatory to report any data leaks that involve privacy sensitive data...

Anonimytext: Anonimization of Unstructured Documents

The anonymization of unstructured texts is nowadays a task of great importance in several text mining applications. Medical records anonymization is needed both to preserve personal health information privacy and to enable further data mining efforts. The described ANONYMITEXT system is designed to de-identify sensitive data from unstructured documents. It has been applied to Spanish clinical notes...

Compromising Anonymity Using Packet Spinning

We present a novel attack targeting anonymizing systems. The attack involves placing a malicious relay node inside an anonymizing system and keeping legitimate nodes “busy.” We achieve this by creating circular circuits and injecting fraudulent packets, crafted in a way that will make them spin an arbitrary number of times inside our artificial loops. At the same time we inject a small number o...

M-Partition Privacy Scheme to Anonymizing Set-Valued Data

In distributed databases there is an increasing need for sharing data that contain personal information. The existing system presented the collaborative data publishing problem for anonymizing horizontally partitioned data at multiple data providers. M-privacy guarantees that anonymized data satisfies a given privacy constraint against any group of up to m colluding data providers. The heuristic al...

Anonymization of Set-Valued Data via Top-Down, Local Generalization

Set-valued data, in which a set of values are associated with an individual, is common in databases ranging from market basket data, to medical databases of patients’ symptoms and behaviors, to query engine search logs. Anonymizing this data is important if we are to reconcile the conflicting demands arising from the desire to release the data for study and the desire to protect the privacy of ...

Journal:
  • CoRR

Volume: abs/0810.5582

Publication date: 2008